(54) Method and apparatus for detecting a point of change in moving images



(57) A system for detecting a point of change between video shots from a video having a plurality of succeeding frames. The system includes video playback apparatus for playing a video chronologically one frame at a time, and a display for displaying the video. The system further includes a processing device for calculating a feature quantity of video image data for each frame, determining a first correlation coefficient between a feature quantity of a current frame and a feature quantity calculated from an immediately preceding frame, determining a second correlation coefficient between the feature quantity of the current frame and a feature quantity of at least two frames preceding the current frame, and indicating on the display a point of change between video shots when the first correlation coefficient and the second correlation coefficient are out of predetermined allowable ranges. The correlation coefficients of each frame are stored and can be used by the processing device to dynamically change a reference used for detecting a point of change between video shots. The change in the reference is performed based on the stored correlation coefficients or feature quantities of past frames.
Description

CROSS REFERENCE TO RELATED APPLICATIONS 

This application is related to application Serial No. 
08/323,866, filed October 17, 1994, the disclosure of 
which is incorporated herein by reference. 

BACKGROUND OF THE INVENTION 

The present invention relates to a video edit system 
and a video browsing method which are capable of 
locating the head of each video or movie shot included 
in a video having a plurality of video or movie shots, 
wherein each video or movie shot includes a series of 
uninterrupted video images photographed by one cam- 
era. More particularly, the present invention relates to a 
method and apparatus for detecting a point of change 
between video shots from a video stored on a video 
tape or video disk, wherein the video includes a plurality 
of succeeding frames. 

With computer operation speed growing faster and 
the capacity of memory increasing in recent years, 
attention is now being focused on database systems 
and presentation tools which can handle movie and 
video information, which conventional systems cannot handle sufficiently. Searching a vast amount of video information for only a desired part and editing it is time-consuming work.

Among the methods proposed to alleviate editing work by using a computer is the method disclosed in "Automatic Video Indexing and Full-Video Search for Object Appearances," Joho Shori Gakkai Ronbunshi (Collection of Papers, published by Information Processing Society), Vol. 33, No. 4, and in Japan Patent Laid-Open No. 111181/1992, entitled "Change Point Detection Method in a Video." This method automatically divides a video into individual shots, prepares a list of images each representing a shot, and uses the images like the index of a book to aid in searching and editing the video. The list allows one to grasp the contents of a video at a glance and to locate a desired scene. It is also possible to handle the video in units of shots, a length convenient for handling, which makes rough editing easy.

Dividing the video into shots requires detecting a point of change between video shots in a video, i.e., a change in appearance from a first video shot to a second video shot. A conventional method of detecting such a point, described in the above-mentioned Japan Patent Laid-Open No. 111181/1992, decides that there is a point of change between video shots, i.e., a change in appearance between two succeeding frames of a video, when the image difference between the two frames is large. (Frames are the still images making up a motion picture; in television, 30 images are displayed each second.)



The above disclosed method, however, has a drawback in that it erroneously picks up, as points of change between video shots, instantaneous image disturbances such as those caused by strobe flashes produced when photographs are taken, as frequently encountered in press conferences, or caused by equipment trouble. Such image disturbances tend to occur successively during one shot, resulting in unwanted division of that shot. Furthermore, in a period of a video in which generally dark images of a night scene continue, the image difference between succeeding frames tends to be smaller than that of bright scenes. Hence, if the same reference is used for both dark scenes and bright scenes in determining a point of change, bright scenes may induce an overly sensitive reaction resulting in erroneous detection of a point of change, while in dark scenes a point of change may fail to be detected.

SUMMARY OF THE INVENTION 

The object of the present invention is to provide a method and apparatus to detect a point of change between video shots from a video consisting of a plurality of succeeding frames.

Another object of the present invention is to provide a method and apparatus to detect a point of change between video shots from a video in such a manner as to prevent erroneous detection of a point of change due to an instantaneous image disturbance.

Yet another object of the present invention is to provide a method and apparatus to detect a point of change between video shots from a video without degrading the detection sensitivity below that of the conventional technique.

Still another object of the present invention is to provide a method and apparatus to detect a point of change between video shots from a video that can respond flexibly to a change in the characteristics of the video.

To achieve the above-described objects, the present invention provides a method having the steps of inputting a video chronologically one frame at a time into a processing device, calculating a feature quantity of video image data for each frame, determining a first correlation coefficient between the feature quantity of a current frame and the feature quantity of an immediately preceding frame, determining a second correlation coefficient between the feature quantity of the current frame and a feature quantity of at least two frames preceding the current frame, and indicating a point of change between video shots when the first correlation coefficient and the second correlation coefficient are out of predetermined allowable ranges.

To further achieve the above-described objects, the present invention provides a system for detecting a point of change between video shots from a video consisting of a plurality of succeeding frames. The system includes video playback apparatus for playing a video chronologically one frame at a time, a display for displaying the video, and a processing device for calculating a feature quantity of video image data for each frame, determining a first correlation coefficient between a feature quantity of a current frame and a feature quantity calculated from an immediately preceding frame, determining a second correlation coefficient between the feature quantity of the current frame and a feature quantity of at least two frames preceding the current frame, and indicating on the display a point of change between video shots when the first correlation coefficient and the second correlation coefficient are out of predetermined allowable ranges.

The present invention also provides a computer 
program product for use with a computer having a dis- 
play. The computer when executing the computer pro- 
gram detects a point of change between video shots 
from a video consisting of a plurality of succeeding 
frames. The computer program product includes a computer readable medium with the computer program recorded thereon. The computer program includes a first code
section for inputting a video chronologically one frame 
at a time into a processing device, a second code sec- 
tion for calculating a feature quantity of video image 
data for each frame, a third code section for determining 
a first correlation coefficient between a feature quantity 
of a current frame and a feature quantity calculated from 
an immediately preceding frame and determining a sec- 
ond correlation coefficient between the feature quantity 
of the current frame and a feature quantity of at least 
two frames preceding the current frame, and a fourth 
code section for indicating a point of change between 
video shots when the first correlation coefficient and the 
second correlation coefficient are out of predetermined 
allowable ranges. 

When there is an instantaneous image disturbance, 
there is a characteristic that an image difference 
between a disturbed frame and an immediately preced- 
ing or following frame is larger than an image difference 
between two frames having the disturbed frame 
between them. Thus, according to the present inven- 
tion, when a pattern of combination of correlation coeffi- 
cients representing such a state is detected, it is 
possible to correctly decide that the detected change is not a true point of change between video shots. Conversely, it is possible to detect only the part where such
an image disturbance has occurred. Further, by dynam- 
ically changing the threshold for detecting the point of 
change between video shots according to the feature of 
the immediately preceding frame, it is possible to make 
a precise decision, according to the features of the 
video image, on whether or not the object being 
checked is a point of change, thereby minimizing erro- 
neous detection. 



BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will be more apparent from 
the following detailed description, when taken in con- 
junction with the accompanying drawings, in which: 

Fig. 1 is a system block diagram of the present 
invention; 

Fig. 2 is a flowchart of a point of change detection 
computer program of the present invention that is 
robust or tolerant of instantaneous disturbances;

Fig. 3 is a schematic diagram of a video having a
part where an instantaneous disturbance has 
occurred; 

Fig. 4 is a schematic diagram of a video having a 
part where there is a normal point of change 
between video shots; 

Fig. 5 is a schematic diagram of a video having a 
part where disturbances have occurred over a plu- 
rality of frames; 

Fig. 6 is a flowchart of a point of change detection 
computer program in which the threshold is 
changed according to the features of a preceding 
video frame; 

Fig. 7 is a diagram showing a typical example of 
transition with time of the correlation coefficient;

Fig. 8 is a flowchart of a computer program to determine the brightness of the entire frame;

Fig. 9 is a flowchart of a point of change detection
computer program that combines the computer pro- 
grams of Figs. 2 and 6; 

Fig. 10 is a diagram showing a typical example of a 
voice signal; 

Fig. 11 is a diagram showing an example of a file
structure; and 

Fig. 12 is a diagram showing an example screen of 
a video edit system. 

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Fig. 1 is a block diagram of one type of system configuration that can implement the features of the present
invention. Reference numeral 1 represents a display 
device that displays an output screen for a computer 4. 
Instructions for the computer 4 are entered from an 
input device 5 such as a keyboard and a pointing 
device, for example, a mouse. A video playback appara- 
tus 10 is an optical disk or video cassette recorder 
(VCR). Video signals output from the video playback 
apparatus 10 are successively converted by an A/D 
converter 3 into digital image data, which is then sup- 
plied to the computer 4. In the computer 4, the digital 
image data is entered through an interface 8 into a 
memory 9 and processed by a central processing unit 
(CPU) 7 according to a program stored in the memory 9. 
When the frames of the video handled by the video playback apparatus 10 are successively assigned frame numbers, beginning with the head of the video, the video of a desired scene is replayed by sending the corresponding frame number to the video playback apparatus 10 through a control line 2. A variety of information may be stored in an external information storage 6, as required by the processing. The memory 9 stores various data generated by the processing described below, which are referenced as necessary.

First, the point of change detection processing that prevents an instantaneous image disturbance from being erroneously detected as a point of change between video shots in a video will be described in detail.

Fig. 2 is an example of a flowchart of a computer program for detecting a point of change between video shots in a video when the computer program is executed in the system shown in Fig. 1. The computer program is stored in the memory 9 and is executed by the CPU 7. When executing sections of code of the computer program, the CPU 7 first, as an initializing process, initializes a variable n representing the frame number currently being processed and resets the memory area used for the histograms (step 200). The initial value of n is the start frame number of the video interval to be processed. Next, the CPU 7
takes in a frame image fn of the frame number n (step 202) and generates a color histogram Hn for the frame image fn (step 204). The color histogram gives the frequency with which pixels of the same color appear in one whole frame image. For example, when a 64-color histogram with red (R), green (G) and blue (B) assigned two bits each is to be generated, the RGB color values of the individual pixels of the frame are reduced to 64 colors (6 bits) represented by only the higher-order two bits of R, G and B, and for each of the 64 colors the number of pixels having that color after reduction is counted. In this case, the color histogram is represented by an array Hn(i), where i takes an integer from 0 to 63. For example, the frequency Hn(0), i.e. for i = 0, represents how many pixels in the frame have higher-order two bits of zero for all of R, G and B. Next, the CPU 7 determines a difference R1n between the color histogram Hn and the color histogram Hn-1 of the frame fn-1 one frame before the current frame fn (step 206). Further, the CPU 7 calculates the difference R2n between Hn and the color histogram Hn-2 of the frame two frames before the current frame fn (step 208). The differences R1n and R2n between histograms can be determined from a calculation such as the χ² test. Various kinds of calculations are described in the above-mentioned literature, and hence a detailed explanation is omitted here.
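Steps 204 to 208 can be sketched in Python as follows. The sketch assumes that a frame is a list of rows of 8-bit (R, G, B) tuples; the symmetric form Σ(a - b)²/(a + b) used here is only one of several χ²-type histogram measures, since the text leaves the exact calculation open. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B) values, 0-255


def color_histogram_64(frame: Sequence[Sequence[Pixel]]) -> List[int]:
    """Build the 64-bin color histogram Hn(i), i = 0..63 (step 204)."""
    hist = [0] * 64
    for row in frame:
        for r, g, b in row:
            # Keep the two high-order bits of each channel and pack them
            # into a 6-bit index laid out as RRGGBB.
            index = ((r >> 6) << 4) | ((g >> 6) << 2) | (b >> 6)
            hist[index] += 1
    return hist


def chi_square_difference(h_a: Sequence[int], h_b: Sequence[int]) -> float:
    """Symmetric chi-square distance between two histograms (steps 206, 208).

    Bins that are empty in both histograms contribute nothing.
    """
    total = 0.0
    for a, b in zip(h_a, h_b):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return total
```

With these helpers, R1n would be `chi_square_difference(Hn, Hn-1)` and R2n `chi_square_difference(Hn, Hn-2)`. For two uniform 4 × 4 frames, one all black and one all white, the histograms hold 16 pixels in bin 0 and bin 63 respectively, and their difference is 32.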

Fig. 3 schematically illustrates a change with time of frame images when an instantaneous image disturbance occurs. There is a disturbance in frame fn-1. In this case, the difference between frames fn-2 and fn-1 is large, and R1n-1 exhibits a large value. Further, the difference between frames fn-1 and fn is large, and R1n shows a large value. However, frames fn-2 and fn are similar, and R2n assumes a small value.



Fig. 4 schematically illustrates a change with time in frame images including a change in video shots. The change in video shots has occurred between frames fn-2 and fn-1. In this case, the difference between frames fn-2 and fn-1 is large, and R1n-1 exhibits a large value. Frames fn-1 and fn, however, are similar, and R1n assumes a small value. The difference between frames fn-2 and fn is large, and R2n takes a large value. The conventional method focuses only on the frame-to-frame difference R1 and thus cannot distinguish between the cases of Figs. 3 and 4; in both cases it detects the point between fn-2 and fn-1 as a point of change between video shots. By using R2n in making the decision, it is possible to distinguish between Figs. 3 and 4. That is, if R1n-1 and R1n are both larger than a threshold th4 and R2n is smaller than a threshold th5, the change is judged to be an instantaneous disturbance (step 210). If R1n-1 is greater than a threshold th1, R1n is smaller than a threshold th2, and R2n is larger than a threshold th3, it is decided that there is a change in video shots between frames fn-2 and fn-1 (step 214), and the processing associated with the detection of the point of change between video shots is performed (step 216).
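The two decisions of steps 210 and 214 amount to a small classifier over the triple (R1n-1, R1n, R2n). A minimal sketch follows; the `Thresholds` container, the function name and the string labels are illustrative, and any threshold values used with it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical container for the fixed thresholds th1 to th5 of Fig. 2."""
    th1: float  # R1[n-1] must exceed this for a shot change
    th2: float  # R1[n] must stay below this for a shot change
    th3: float  # R2[n] must exceed this for a shot change
    th4: float  # R1[n-1] and R1[n] both above this suggest a disturbance
    th5: float  # R2[n] below this confirms the image returned to normal


def classify(r1_prev: float, r1: float, r2: float, th: Thresholds) -> str:
    """Classify the differences around frame n as in steps 210 and 214."""
    if r1_prev > th.th4 and r1 > th.th4 and r2 < th.th5:
        return "disturbance"  # Fig. 3: f(n-1) differs from both neighbors
    if r1_prev > th.th1 and r1 < th.th2 and r2 > th.th3:
        return "shot_change"  # Fig. 4: change between f(n-2) and f(n-1)
    return "none"
```

With th1 = th3 = th4 = 50 and th2 = th5 = 20, a Fig. 3 pattern such as (80, 90, 5) is classified as a disturbance and a Fig. 4 pattern such as (80, 5, 85) as a shot change.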

Of course, a point of change between video shots may be located by using only the conditions (1) whether R1n-1 and R1n are both large and (2) whether R1n-1 is large and R1n is small, i.e., condition (1) may be recognized as the case of Fig. 3 and condition (2) as the case of Fig. 4. The additional use of the R2n value makes the decision much more reliable because it confirms that the image has returned to normal after the disturbance. If the CPU 7 determines via step 210 that the detected change is an instantaneous disturbance, R1n-1 and R1n are reset to zero (step 212). If R1n-1 and R1n were not reset, the resulting combination in which R1n is large, R1n+1 is small and R2n+1 is large, as in Fig. 3, would cause the CPU 7, via step 214 in the processing of the next, (n+1)th, frame, to detect the point between frames fn-1 and fn as a change in video shots.
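The whole loop of Fig. 2, including the reset of step 212, might look like the sketch below. It works on precomputed histograms and uses an L1 distance as a simple stand-in for the χ² difference; the function names and the index convention of the returned lists are assumptions made for this illustration.

```python
from typing import Callable, List, Sequence, Tuple

Hist = Sequence[float]


def l1_difference(a: Hist, b: Hist) -> float:
    """Sum of absolute bin differences (a stand-in for the chi-square test)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_fig2(hists: List[Hist], th1: float, th2: float, th3: float,
                th4: float, th5: float,
                diff: Callable[[Hist, Hist], float] = l1_difference
                ) -> Tuple[List[int], List[int]]:
    """Return (cuts, disturbances) for a list of per-frame histograms.

    A cut k means the shot changes between frames k-1 and k; a
    disturbance k means frame k was judged an instantaneous disturbance.
    """
    r1 = [0.0] * len(hists)  # r1[n] holds R1n = diff(Hn, Hn-1)
    cuts: List[int] = []
    disturbances: List[int] = []
    for n in range(1, len(hists)):
        r1[n] = diff(hists[n], hists[n - 1])                  # step 206
        if n < 2:
            continue                                          # R2n needs Hn-2
        r2 = diff(hists[n], hists[n - 2])                     # step 208
        if r1[n - 1] > th4 and r1[n] > th4 and r2 < th5:      # step 210
            disturbances.append(n - 1)
            r1[n - 1] = r1[n] = 0.0                           # step 212: reset
        elif r1[n - 1] > th1 and r1[n] < th2 and r2 > th3:    # step 214
            cuts.append(n - 1)                                # step 216
    return cuts, disturbances
```

For three-bin histograms A = [10, 0, 0], X = [0, 10, 0] and B = [0, 0, 10], the sequence A, A, X, A, A, B, B (X being a one-frame flash) with th1 = th3 = th4 = 15 and th2 = th5 = 5 yields a disturbance at frame 2 and a single cut at frame 5. Removing the reset line would additionally report a cut at frame 3, which is exactly the false detection described above.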

The above-described processing makes it possible to reliably detect a point of change between video shots even when there are instantaneous disturbances in video images. By performing the above-described processing, a part where an instantaneous disturbance has occurred can also be located. The most typical case of instantaneous disturbance is when a camera strobe flashes, as in a press conference. Because strobes tend to flash when newsmen think the scene is important, the part of the video where the flashes occur is often an important scene. Hence, this may be used as one of the means to pick up important scenes from the video. Another typical case of instantaneous disturbance is an intentional one used to have a psychological effect on viewers, such as illegal subliminal advertising on television. Such advertising is said to act subliminally on the consciousness of viewers, without their being aware of it, by inserting one frame of a special image into the video at certain intervals, thereby bringing them under a sort of hypnotism. The above-described processing can also be used to automatically check whether the video contains any such unlawful effects and thereby prevent broadcasting of such a video.

For simplicity, the above describes a situation where the disturbance lasts only one frame. By lengthening the frame interval used in calculating R2n, disturbances lasting any number of frames can be dealt with in a similar way. In the example illustrated in Fig. 5, there are disturbances in the two frames fn-1 and fn. R2n+1, which represents the difference between the frames on both sides of these disturbances, assumes a small value, while R1n-1, R1n and R1n+1 take relatively large values. From this combination of frame-to-frame differences, it is possible to decide whether the detected change is an image disturbance or a point of change. In this way, by successively increasing the interval, i.e. the number of frames skipped in calculating R2n, from a one-frame jump to a two-frame jump and so on, it is possible to distinguish a disturbance lasting an arbitrary number of frames from a point of change between video shots.
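The widening of the R2 jump can be sketched as a search over the disturbance length k. This sketch makes one simplifying assumption: it tests only the differences entering and leaving the disturbed run, not the differences between disturbed frames, which in the Fig. 5 example are also relatively large. The function names and the `max_len` parameter are illustrative, and an L1 distance again stands in for the χ² difference.

```python
from typing import Optional, Sequence

Hist = Sequence[float]


def l1_difference(a: Hist, b: Hist) -> float:
    """Sum of absolute bin differences (a stand-in for the chi-square test)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def disturbance_length(hists: Sequence[Hist], n: int, max_len: int,
                       th4: float, th5: float) -> Optional[int]:
    """Return k if frames n-k .. n-1 look like a k-frame disturbance, else None.

    The jump used for R2 grows from one skipped frame (k = 1) up to max_len.
    """
    for k in range(1, max_len + 1):
        before = n - k - 1                    # last frame before the run
        if before < 0:
            break
        r2 = l1_difference(hists[n], hists[before])               # k frames skipped
        entry = l1_difference(hists[before + 1], hists[before])   # into the run
        leave = l1_difference(hists[n], hists[n - 1])             # out of the run
        if entry > th4 and leave > th4 and r2 < th5:
            return k
    return None
```

For the sequence A, A, X, Y, A, A with two disturbed frames X and Y, checking at frame 4 fails for the one-frame jump and succeeds for the two-frame jump, so the function returns 2.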

After the above-described processing, the CPU 7 
increments by one the frame number to be processed 
and prepares to take in the next frame (step 218). 

The computer program executed by the CPU 7 to perform the point of change detection method illustrated in Fig. 2 includes various code sections corresponding to the steps. The code sections are written in a computer programming language executable by the CPU 7. For example, the code sections corresponding to steps 200-214 and 218 can be written in a high-speed, low-level computer programming language such as C or assembly language. The code section corresponding to step 216 can be written in a low-speed, high-level computer programming language such as BASIC or some other interpretive language.

Next, a detailed description will be provided of point of change detection processing that responds flexibly to changes in the characteristics of the object video to correctly detect a point of change between video shots, for example by suppressing overly sensitive detection of a change in a bright scene and preventing failure to detect a change in a dark scene.

Fig. 6 illustrates an example flowchart of a computer program for detecting a point of change between video shots in a video when the computer program is executed in the system shown in Fig. 1. The basic flow of the computer program is similar to that of the flowchart of Fig. 2, except that the threshold for determining the point of change between video shots is variable in Fig. 6, while that of Fig. 2 is fixed. The advantage of using a variable threshold is explained by referring to Fig. 7. Fig. 7 takes as an example a video consisting of three shots in a row, namely a bright scene with active motions 708, a dark scene 710 (dark scene 1), and a dark scene 712 (dark scene 2), and shows a typical change with time of the correlation coefficient between frames. In Fig. 7, 702 and 704 represent the positions where the shot changes.



In a scene with active motions, the image difference 
between succeeding frames is large. Hence, in the 
interval of 708 the correlation coefficient continues to 
have relatively large values. In the dark scene, on the 
other hand, colors making up the frame image fall into 
only a few kinds centering around black, so that the 
shapes of histograms are similar and the correlation 
coefficients in the interval of 710 and 712 assume small 
values. In these intervals, the correlation coefficient is 
relatively small even at the point of change in video 
shots for the above reason. Where the threshold for determining a change in video shots is fixed at a constant value, as shown by the broken line 706, if the threshold is set relatively high, a point of change such as 704 in a dark scene may fail to be detected, and if on the contrary the threshold is set relatively low, a part of the correlation coefficients, like 700, in the interval where there is active movement may cause an overly sensitive response.

As a method of suppressing overactive detection in a scene with active motions, the above-mentioned Japan Patent Laid-Open No. 111181/1992 proposes a
method that calculates the rate of change between the 
immediately preceding correlation coefficient and the 
present correlation coefficient and, when the rate of 
change exceeds an allowable value, decides that the 
detected change represents the change in video shots. 
In this method, the value of the correlation coefficient 
represents the amount of change in an image per unit 
time and thus corresponds to the moving speed of the 
camera or an object in the video. In many cases, 
because the moving speed of the camera or an object in 
the video does not change so greatly, the rate of change 
assumes a small value whether or not the scene has a 
motion, with only the value at the change of shot rising 
sharply. Hence, it is possible to use the same threshold 
for dark scenes and for scenes with active motions. 

There are, however, numerous cases where the image changes sharply, as when the camera is moved or an object passes in front of the camera. In such cases, the rate of change 714 of 700 with respect to the immediately preceding correlation coefficient may be equal to or larger than the rate of change 716 in a dark scene. It is therefore necessary to deal flexibly with changes in the characteristics of each scene of the video and to change the threshold in response to such changes.

The operation of the CPU 7 when executing the computer program represented by the flowchart illustrated in Fig. 6 will now be explained.

First, as an initializing process, the CPU 7, when executing the computer program, sets the variable n representing the frame number currently being processed to an initial value and resets the memory area used for the histograms (step 600). The initial value of n is the start frame number of the video interval to be processed. Next, the CPU 7 takes in a frame image fn of the frame number n (step 602) and generates a color histogram Hn for the frame image fn (step 604). The CPU 7 then determines the average brightness level Bn of the frame fn (step 606). A detailed flowchart of step 606 is illustrated in Fig. 8.

As per Fig. 8, suppose that the frame image fn has a width w and a height h. First, the variable Bn is reset to 0 (step 800). Then, a brightness level Bp(x, y) is determined for each pixel in the frame fn, where (x, y) represents the two-dimensional plane coordinates of a pixel in the frame fn. Bp(x, y) represents the brightness of a pixel when frame fn is a monochromatic image and, when it is a color image, the luminance component of the color of the pixel. If an image is represented by a system in which the luminance component is separated, as in the YUV color system, the value of the luminance component may be used as is, i.e., in the YUV color system, the value of Y is used. In the RGB color system commonly employed in computers, it is necessary to calculate the luminance component from the value of each RGB component. In the example of Fig. 8, according to step 802, the maximum value of the RGB components is taken as the luminance value. R(x, y) represents the red component of a pixel located at a position (x, y); similarly, G(x, y) represents the green component and B(x, y) represents the blue component. Because the luminance of a pixel is almost proportional to the green component, the value of the green component may also be used as an approximation of Bp(x, y). The value of Bp(x, y) determined in this way is added to Bn (step 804) to obtain the luminance Bn of the entire frame. The above processing may be performed simultaneously with the calculation of the color histogram in step 604 to eliminate repeated processing, such as reading the value of each pixel and incrementing the variables x and y, thus speeding up the processing.
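The accumulation of Fig. 8 can be sketched as follows, using either the maximum RGB component (step 802) or the green component as the approximation mentioned above. The final division by w × h, which turns the accumulated sum into the average brightness level referred to in step 606, is an assumption of this sketch, as is the representation of a frame as rows of (R, G, B) tuples.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def frame_brightness(frame: Sequence[Sequence[Pixel]],
                     green_only: bool = False) -> float:
    """Brightness level Bn of a frame, accumulated pixel by pixel as in Fig. 8."""
    h = len(frame)
    w = len(frame[0]) if h else 0
    total = 0                                   # step 800: Bn = 0
    for y in range(h):
        for x in range(w):
            r, g, b = frame[y][x]
            # step 802: Bp(x, y) = max(R, G, B); optionally G as an approximation
            bp = g if green_only else max(r, g, b)
            total += bp                         # step 804: Bn += Bp(x, y)
    # Assumption: divide by w * h so that Bn is an average brightness level.
    return total / (w * h) if w * h else 0.0
```

For the 2 × 2 frame [[(10, 20, 30), (200, 100, 50)], [(0, 0, 0), (40, 80, 60)]], the maximum-component rule gives (30 + 200 + 0 + 80) / 4 = 77.5, while the green-only approximation gives 50.0.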

Next, the CPU 7 calculates the difference R1n between the histogram Hn-1 of the immediately preceding frame and the histogram Hn of the current frame fn (step 608). Next, the CPU 7 changes the threshold according to the features of the immediately preceding video image (step 610). In this example, the brightness and the motion of the immediately preceding video image are used as feature quantities, and a new threshold th7 is obtained according to the equation

$$th_7 = th_6 + \alpha \, B_{n-1} + \beta \, R1_{n-1}$$

The threshold th6 is given a value that permits the best processing for a video image with a standard level of brightness and small motion; α is a weight value indicating by how much the threshold th6 is varied up or down for a change in brightness; and β is a weight value that determines by how much the threshold th6 is varied for a certain magnitude of motion. If there are other factors that require the threshold to be changed, any necessary number of such factors can be added to the equation.

For instance, the number of frames processed per unit time for the immediately preceding video image is one of the important factors. A video image of the NTSC system has 30 frames per second. Depending on the capability of the computer, the timing of taking in the
next frame may pass while calculating the correlation 
coefficient between frames. In this case, if the video 
image is replayed at an ordinary speed, the frame-to- 
frame correlation coefficient to be calculated is of 
course the one with one frame skipped. That is, the 
number of frames to be processed may be 30 frames 
per second or several frames per second depending on 
the processing capability of the computer. 

The wider the interval between the frames to be compared, the greater the image difference will be, so that the value of the correlation coefficient tends to increase as the number of frames processed per unit time decreases. Therefore, in such an interval, overactive detection can better be prevented by setting the threshold relatively high. When the number of frames processed does not change greatly while the program is being executed and the computer's capability is known, the threshold th6 may be changed at the initial stage, eliminating the need to consider changing th6 when calculating th7.

The CPU 7 checks whether R1n is greater than the threshold th7 calculated as described above (step 612). If it is, the system decides that there is a change in video shots between the (n-1)th frame and the nth frame, and performs the change detection processing (step 614). Finally, the CPU 7 increments the frame number to be processed by one and prepares for taking in the next frame (step 616).
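The loop of Fig. 6, with the threshold of step 610, can be sketched as below. It takes precomputed histograms and per-frame brightness values, uses an L1 distance in place of the χ² difference, and treats R1 before the second frame as 0; the function names and all numeric values in the usage note are hypothetical.

```python
from typing import List, Sequence

Hist = Sequence[float]


def l1_difference(a: Hist, b: Hist) -> float:
    """Sum of absolute bin differences (a stand-in for the chi-square test)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_fig6(hists: Sequence[Hist], brightness: Sequence[float],
                th6: float, alpha: float, beta: float) -> List[int]:
    """Return every n for which a shot change is declared between frames n-1 and n."""
    cuts: List[int] = []
    r1_prev = 0.0                     # no R1 exists before the second frame
    for n in range(1, len(hists)):
        r1 = l1_difference(hists[n], hists[n - 1])               # step 608
        th7 = th6 + alpha * brightness[n - 1] + beta * r1_prev   # step 610
        if r1 > th7:                                             # step 612
            cuts.append(n)                                       # step 614
        r1_prev = r1
    return cuts
```

Consider three bright frames with strong motion (R1 = 50, then a spike of 70), a cut into a dark scene, and later a dark-to-dark cut with R1 = 60. With th6 = 20, α = 0.2 and β = 0.5, the adaptive threshold reports both cuts and ignores the spike. A fixed threshold of 85, high enough to suppress the spike, misses the dark-scene cut, while a fixed threshold of 40 also fires on the motion frames, which mirrors the dilemma of Fig. 7.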

The computer program represented by the flow- 
chart illustrated in Fig. 6 includes various code sections 
corresponding to the steps. The code sections are writ- 
ten in a computer programming language executable by 
the CPU 7. For example, the code sections corresponding to steps 600-612 and 616 can be written in a high-speed, low-level computer programming language such as C or assembly language. The code section corresponding to step 614 can be written in a low-speed, high-level computer programming language such as BASIC or some other interpretive language.

While in the above-described processing the threshold value is changed according to the features of only the one frame immediately preceding frame fn, it is possible to change it based on the history of the features of an arbitrary number of immediately preceding frames. For example, as a measure of the intensity of motion, the average value or the maximum value of the correlation coefficient over several preceding frames may be used instead of R1n-1 in the equation of step 610. The threshold for R1n may also be changed based on the features of the frames fn+1, fn+2, ... immediately following fn. With this method, when R1n exceeds the threshold, it is decided retrospectively that there was a change in shot between fn-1 and fn.
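The history-based variant can be sketched with a bounded queue of past R1 values whose mean or maximum replaces R1n-1 in the step 610 equation. The window size, the zero brightness term before the first frame, and the function name are assumptions of this sketch; the look-ahead variant that uses following frames is not shown.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def thresholds_from_history(r1_values: Iterable[float], b_values: Iterable[float],
                            th6: float, alpha: float, beta: float,
                            window: int = 5, use_max: bool = False
                            ) -> Iterator[Tuple[float, float]]:
    """Yield (R1n, threshold for R1n) for each frame.

    The motion term is the mean (or maximum) of up to `window` preceding R1
    values instead of R1n-1 alone; the brightness term still uses Bn-1.
    """
    history: Deque[float] = deque(maxlen=window)  # R1 of the preceding frames
    b_prev = 0.0  # assumption: no brightness term before the first frame
    for r1, b in zip(r1_values, b_values):
        if history:
            motion = max(history) if use_max else sum(history) / len(history)
        else:
            motion = 0.0
        yield r1, th6 + alpha * b_prev + beta * motion
        history.append(r1)
        b_prev = b
```

For R1 values 10, 30, 20, 100 with th6 = 20, α = 0, β = 0.5 and a window of three frames, the mean-based thresholds are 20, 25, 30 and 30, whereas the maximum-based ones are 20, 25, 35 and 35. Taking the maximum keeps the threshold high for a few frames after a burst of motion, which is more conservative than the mean.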

Further, while the above described processing 
changes the threshold, it is possible to fix the threshold 
at a constant value and instead change the value of the 
correlation coefficient. 
Further, by combining the processing of Fig. 2 and
the processing of Fig. 6, it is possible to realize a more 
reliable point of change detection processing for video. 
Fig. 9 illustrates a flowchart of a computer program 
embodying such a combination. The contents of proc- 
esses making up the flowchart illustrated in Fig. 9 are 
already described above. 

First, as initialization processing, the CPU 7 sets the variable n, representing the frame number of the current frame to be processed, to an initial value and resets the memory area used for the histograms (step 900). The initial value of n is the start frame number of the video interval to be processed. Next, the CPU 7 takes in the frame image fn with frame number n (step 902) and generates a color histogram Hn for the frame image fn (step 904). Then, the CPU 7 calculates the average brightness Bn (step 906) and determines the difference R1n between the color histogram Hn and the color histogram Hn-1 of the immediately preceding frame fn-1 (step 908). It further determines the difference R2n between Hn and the color histogram Hn-2 of the frame fn-2 two frames before (step 910). Next, the CPU 7 calculates the threshold th8 according to the feature of the immediately preceding video image (step 912). When R1n-1 and R1n are both greater than the threshold th4 and R2n is smaller than the threshold th5, an instantaneous disturbance is taken to have occurred (step 914). Then, when R1n-1 is greater than the threshold th8, R1n is smaller than the threshold th2, and R2n is greater than the threshold th3, it is decided that there is a change in video shots between frames fn-2 and fn-1 (step 918), and a variety of processes associated with the detection of the change are performed (step 920). When step 914 decides that the detected change is an instantaneous disturbance, R1n-1 and R1n are reset to zero (step 916). In the last step, the CPU 7 increments the frame number to be processed by one and prepares to take in the next frame (step 922).
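The decision logic of steps 908 to 922 can be sketched as below, assuming frames arrive as normalized color histograms. Several details are assumptions rather than the patent's method: the difference measure `hist_diff` (half the L1 distance), all threshold values, and the step 912 rule, which here computes th8 from R1 of the earlier frame pair (fn-2, fn-3) as the "feature of the immediately preceding video image". The brightness step 906 is omitted.

```python
from typing import List, Sequence

Histogram = Sequence[float]  # normalized: bins sum to 1


def hist_diff(h1: Histogram, h2: Histogram) -> float:
    """Histogram difference in [0, 1]; an assumed stand-in for R1n and R2n."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_changes_fig9(hists: Sequence[Histogram],
                        th2: float = 0.2, th3: float = 0.5,
                        th4: float = 0.5, th5: float = 0.2,
                        th8_base: float = 0.4, k: float = 0.5) -> List[int]:
    """Return frame numbers m such that a shot change lies between f(m-1) and f(m)."""
    n_frames = len(hists)
    r1 = [0.0] * n_frames  # r1[n] holds R1n; r1[0] has no predecessor
    if n_frames >= 2:
        r1[1] = hist_diff(hists[1], hists[0])
    changes = []
    for n in range(2, n_frames):
        r1[n] = hist_diff(hists[n], hists[n - 1])           # step 908
        r2 = hist_diff(hists[n], hists[n - 2])              # step 910
        th8 = th8_base + k * r1[n - 2]                      # step 912 (hypothetical rule)
        if r1[n - 1] > th4 and r1[n] > th4 and r2 < th5:    # step 914: disturbance
            r1[n - 1] = r1[n] = 0.0                         # step 916
        elif r1[n - 1] > th8 and r1[n] < th2 and r2 > th3:  # step 918
            changes.append(n - 1)  # change between f(n-2) and f(n-1)
    return changes
```

With two-bin histograms `A = [1.0, 0.0]` and `B = [0.0, 1.0]`, the sequence `A, A, A, B, B, B` yields `[3]`, while a one-frame flash `A, A, A, B, A, A` yields `[]`. Without the reset of step 916, the flash would be reported as a change at frame 4 on the following iteration.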

As with the computer programs represented by the flowcharts illustrated in Figs. 2 and 6, the computer program represented by the flowchart illustrated in Fig. 9 includes various code sections corresponding to the steps of the flowchart. In particular, the code sections corresponding to steps 900 - 918 and 922 can be written in a high-speed, low-level computer programming language such as C or assembly language. The code section corresponding to step 920 can be written in a low-speed, high-level computer programming language such as BASIC or some other interpreted language.

In the above-described point-of-change detection processing for video, the correlation coefficient may be determined, as described in the literature cited above, by dividing the frame image into blocks, calculating the histogram of each block to determine a correlation coefficient for each block, and making an overall decision on the combination of these coefficients to determine the correlation coefficient of the entire frame. This procedure increases the difference in the correlation coefficient value between a point of change in video shots and other intervals.
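A block-wise measure might look like the sketch below. The combination rule used in the cited literature is not given in this section. As an assumption, the sketch counts the fraction of blocks whose histogram difference exceeds a per-block threshold. Frames are taken to be grey-level images, and the block grid, bin count, and `block_th` are hypothetical choices.

```python
from typing import List

Frame = List[List[int]]  # grey levels 0..255, one list per pixel row


def block_hist(frame: Frame, y0: int, y1: int, x0: int, x1: int,
               bins: int = 16) -> List[float]:
    """Normalized grey-level histogram of rows y0..y1-1 and columns x0..x1-1."""
    hist = [0] * bins
    for y in range(y0, y1):
        for x in range(x0, x1):
            hist[frame[y][x] * bins // 256] += 1
    total = (y1 - y0) * (x1 - x0)
    return [c / total for c in hist]


def blockwise_change_ratio(f1: Frame, f2: Frame, by: int = 2, bx: int = 2,
                           bins: int = 16, block_th: float = 0.5) -> float:
    """Fraction of blocks whose histogram difference exceeds block_th."""
    rows, cols = len(f1), len(f1[0])
    changed = 0
    for i in range(by):
        for j in range(bx):
            y0, y1 = i * rows // by, (i + 1) * rows // by
            x0, x1 = j * cols // bx, (j + 1) * cols // bx
            h1 = block_hist(f1, y0, y1, x0, x1, bins)
            h2 = block_hist(f2, y0, y1, x0, x1, bins)
            diff = 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))
            if diff > block_th:
                changed += 1
    return changed / (by * bx)
```

A change confined to one quadrant scores 0.25, while a full cut scores 1.0. This separation is the effect the text describes: local motion alters few blocks, whereas a change of shot alters most of them.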

In addition to detecting points of change between video shots with the above-described point-of-change detection processing, the user may wish to add and register a point of change by watching the video image being processed and specifying an appropriate location of change. Because human response speed is limited, a significant length of time often passes between the moment the user finds an appropriate location of change and decides to register it, and the moment that specification reaches the computer. Because of this delay, the specified location of change may deviate intolerably from the intended one. Hence, it may be convenient to provide apparatus that registers a location of change by subtracting, from the specified location, a preset time interval matched to the response speed of the user. Conversely, another apparatus may be useful that registers a location of change by adding a preset time interval to the specified location.
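The reaction-time correction amounts to shifting the specified frame by a fixed number of frames. In the minimal sketch below, the delay of 0.5 s and the rate of 30 frames per second are hypothetical defaults, not values from the text.

```python
def register_manual_point(specified_frame: int, reaction_delay_s: float = 0.5,
                          fps: float = 30.0, subtract: bool = True) -> int:
    """Shift a user-specified frame by a preset reaction-time offset.

    reaction_delay_s and fps are hypothetical defaults.
    """
    offset = round(reaction_delay_s * fps)
    if subtract:
        return max(0, specified_frame - offset)  # never before the first frame
    return specified_frame + offset
```

With these defaults, a press at frame 100 registers frame 85, or frame 115 with `subtract=False`.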

The video is divided into partial intervals at the detected points of change. If the division is made without regard to the voice track, later editing of the video may become difficult. For example, if an interval is divided while a person in the video is talking, the speech will not make sense when only one of the divided parts is viewed. This is a particular problem in a video edit system that selects and rearranges individual shots divided at the detected points of change. Hence, when a point of change is detected, the voice signal is also checked. If this part of the video contains voice, the change is not registered until the voice ends, i.e., until a silent part is reached, at which time the point of change is registered. Whether the part being checked contains voice can be determined from the sound volume.

Fig. 10 shows a typical example of a voice signal. The abscissa represents time and the ordinate represents the amplitude of the sound. Because the sound volume is given by the amplitude, a part whose amplitude is smaller than a preset threshold can be treated as silent. A voiced part may, however, contain momentary small amplitudes. To prevent an erroneous decision, the amplitude is checked over a predetermined length of time to confirm that the small-amplitude state persists before a silent part is detected. This procedure applies both when the computer detects a point of change automatically and when the user decides on a point of change manually.
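The deferral and the debounced silence check can be sketched together as follows. The sketch assumes one amplitude value per video frame, for example a peak level. The amplitude threshold `amp_th` and the required run length `min_len` are hypothetical.

```python
from typing import Optional, Sequence


def find_silence_start(amplitudes: Sequence[float], start: int,
                       amp_th: float = 0.05, min_len: int = 3) -> Optional[int]:
    """Index where the first silent run of at least min_len samples begins.

    A sample is silent when |amplitude| < amp_th. Requiring min_len
    consecutive silent samples ignores momentary dips inside voiced parts.
    """
    run = 0
    for i in range(start, len(amplitudes)):
        if abs(amplitudes[i]) < amp_th:
            run += 1
            if run >= min_len:
                return i - min_len + 1
        else:
            run = 0
    return None


def register_after_voice(change_index: int, amplitudes: Sequence[float],
                         amp_th: float = 0.05, min_len: int = 3) -> Optional[int]:
    """Defer a detected change to the start of the next confirmed silent part."""
    return find_silence_start(amplitudes, change_index, amp_th, min_len)
```

For the levels `[0.5, 0.6, 0.01, 0.4, 0.5, 0.02, 0.01, 0.03, 0.6]`, a change detected at index 0 is registered at index 5. The single-sample dip at index 2 is ignored because it is too short to count as silence.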

By dividing the video into shots according to the methods described above, search and editing can be performed on a shot basis. To make it easier for the user to manipulate the video in units of shots, it is convenient if the contents of each shot can be identified by a picture. When a point of change between shots is detected, the frame image at that instant is extracted as a representative image, resized to a size appropriate for search and edit applications, and stored as a file in storage such as a disk so that it can be retrieved as required. The processing associated with this storing operation is performed, for example, in step 216 of Fig. 2.

In addition to a still picture of the frame image, a moving image of a specified length starting from the detected point of change may be used as a representative image. As the representative image of a shot, a picture taken some time after the point of change is often more appropriate than the picture exactly at the point of change. It is therefore suitable to use as the representative picture an image offset by a specified time from the point of change. With the point-of-change detection processing of Figs. 2 and 9, however, whether a detected change is really a point of change between shots can be decided only after several frames have passed. An image extracted at the moment of decision is therefore already offset by a predetermined time from the point of change. In this case, if the user wants to pick up the frame image exactly at the point of change between video shots, it is convenient to store several preceding frames in a buffer. The buffer holds the latest frames, and adding one frame erases the oldest one.
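Such a buffer behaves like a fixed-length ring buffer. In the sketch below, `collections.deque` with `maxlen` provides the drop-oldest behaviour. The class name and the default size are hypothetical. The size must cover the detection delay; in Fig. 9, for example, the change between fn-2 and fn-1 is decided only when fn arrives.

```python
from collections import deque
from typing import Any, Optional


class FrameBuffer:
    """Holds the most recent `size` frames; adding one drops the oldest."""

    def __init__(self, size: int = 8) -> None:
        self._frames: deque = deque(maxlen=size)  # (frame_number, image) pairs

    def push(self, frame_number: int, image: Any) -> None:
        self._frames.append((frame_number, image))

    def get(self, frame_number: int) -> Optional[Any]:
        """Return the buffered image for frame_number, or None if it was evicted."""
        for number, image in self._frames:
            if number == frame_number:
                return image
        return None
```

After pushing frames 0 to 4 into a buffer of size 3, frame 2 can still be retrieved, but frame 1 has already been evicted.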

In search and edit operations, it must be possible to know immediately where in the original video recorded on video tape the shot identified by a representative image is located. For this purpose, apparatus is provided that stores the frame number and time code of the original video tape in association with the representative image.

Other necessary information includes the duration of a shot and the actual time and date at which the shot was broadcast on television. The time and date can easily be determined by reading the clock built into the computer, and the duration of a shot can be calculated in real time as the difference between the times or frame numbers of two adjacent points of change. This information is also stored in association with the representative image when the latter is stored.
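The frame-number bookkeeping can be sketched as below. The sketch assumes an integer frame rate and non-drop-frame time code. NTSC drop-frame time code at 29.97 frames per second is not handled. Both function names are hypothetical.

```python
from typing import List


def frames_to_timecode(frame: int, fps: int = 30) -> str:
    """HH:MM:SS:FF time code for a frame count (non-drop-frame, integer fps)."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def shot_durations(change_frames: List[int]) -> List[int]:
    """Duration in frames of each shot bounded by two adjacent points of change."""
    return [b - a for a, b in zip(change_frames, change_frames[1:])]
```

At 30 frames per second, frame 109835 corresponds to `01:01:01:05`, and points of change at frames 0, 90, and 300 give shot durations of 90 and 210 frames.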

Attribute information that the user adds to each representative image as necessary is also associated with the representative image when they are stored. Under the file management system of a general disk operating system (DOS), for example, the representative image and its associated information may be related to each other by storing the associated information in a file with the same file name as the representative image but a different extension. More concretely, the representative image may be stored in CUT00001.IMG, the associated time in CUT00001.TIM, and the duration in CUT00001.DUR. Because a video consists of many shots, however, this procedure makes the number of files too large and their management difficult. It is therefore possible to manage all of this information in a single file.
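The per-shot naming scheme can be expressed as a small helper. The function name and the five-digit counter are assumptions based on the CUT00001 example. The base name is eight characters long, so it fits the DOS 8.3 limit.

```python
from typing import Dict


def sidecar_names(shot_number: int) -> Dict[str, str]:
    """DOS 8.3 file names for one shot: same base name, one extension per kind."""
    base = f"CUT{shot_number:05d}"  # e.g. CUT00001
    return {"image": f"{base}.IMG", "time": f"{base}.TIM", "duration": f"{base}.DUR"}
```

Because each shot needs three files, a video with 500 shots already produces 1,500 files. This growth is what motivates the single-file structure.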

Fig. 11 illustrates an example of a file structure for use in the present invention. 1100 is a header table that stores information related to the entire file, such as an identifier distinguishing this file format from other formats and the total number of registered shots. A representative image storage address table 1102 successively stores as many offset values as there are shots. Each offset value indicates the position in the file at which the data of a shot's representative image is stored. Similarly, 1104 is a time code table and 1106 an attribute information table. When other related information is stored, the necessary tables are prepared in the same way. These tables are arranged so that entries at the same position from the head of each table relate to the same representative image and thus correspond to one another. 1108 to 1118 are data areas that store the respective kinds of information.
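A single-file layout in the spirit of Fig. 11 can be sketched with the standard `struct` module. The byte-level details are all assumptions. These include the 4-byte identifier `SHOT`, little-endian 32-bit fields, time codes stored as frame numbers, and length-prefixed data blobs. The point the sketch illustrates is that entry i of every table refers to shot i.

```python
import struct
from typing import List, Tuple

MAGIC = b"SHOT"  # hypothetical identifier in the header table (1100)
U32 = struct.Struct("<I")  # little-endian unsigned 32-bit field


def pack_shots(shots: List[Tuple[bytes, int, bytes]]) -> bytes:
    """Serialize (image, frame_number, attributes) records into one file image.

    Assumed layout: header | image-offset table | time-code table |
    attribute-offset table | image data area | attribute data area.
    """
    n = len(shots)
    header = MAGIC + U32.pack(n)
    data_start = len(header) + 3 * U32.size * n
    data = bytearray()
    img_offsets, attr_offsets = [], []
    for image, _, _ in shots:  # image data area
        img_offsets.append(data_start + len(data))
        data += U32.pack(len(image)) + image
    for _, _, attrs in shots:  # attribute data area
        attr_offsets.append(data_start + len(data))
        data += U32.pack(len(attrs)) + attrs
    tables = b"".join(U32.pack(v) for v in img_offsets)
    tables += b"".join(U32.pack(frame) for _, frame, _ in shots)
    tables += b"".join(U32.pack(v) for v in attr_offsets)
    return header + tables + bytes(data)


def read_shot(buf: bytes, i: int) -> Tuple[bytes, int, bytes]:
    """Return the i-th (image, frame_number, attributes) record."""
    if buf[:4] != MAGIC:
        raise ValueError("not a shot file")
    (n,) = U32.unpack_from(buf, 4)
    if not 0 <= i < n:
        raise IndexError(i)
    tables_start = len(MAGIC) + U32.size  # first byte after the header

    def entry(table_no: int) -> int:
        # Entries at the same index in every table refer to the same shot.
        (value,) = U32.unpack_from(buf, tables_start + U32.size * (table_no * n + i))
        return value

    def blob(offset: int) -> bytes:
        (length,) = U32.unpack_from(buf, offset)
        return buf[offset + U32.size: offset + U32.size + length]

    return blob(entry(0)), entry(1), blob(entry(2))
```

Adding another kind of information means appending one more table and one more data area, without changing how existing entries are located.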

Fig. 12 illustrates an example screen of a video edit system that can locate the start of each shot of a video or movie. 1 is the display, 1232 a speaker producing voice and background music, 5 an indirect pointing device such as a mouse or joystick, 1234 a keyboard, and 1230 a direct pointing device such as a touch panel.

A monitor window 1200 in the display 1 has an 
operation panel 1202 of the same type as a video tape 
recorder (VTR) and allows the user to freely replay a 
video. The video displayed on the monitor screen corre- 
sponds to text in a book, and the panel (button) opera- 
tion corresponds to page turning. A lower right window 
1208 displays a list of representative scenes for the 
shots, and a center right window 1212 shows a list of 
subjects that appear on the video. These lists are gen- 
erally called indices. The list of scenes in the window 
1208 is prepared by selecting typical frame images from
a variety of scenes in the video through the point of 
change detection method, reducing the selected scenes 
in size and arranging them in chronological order as 
icons. These pictures may be considered as headings 
of scenes and correspond to the table of contents in a 
book. Each subject is one of the important constituent 
elements of a scene and in this respect, corresponds to 
a key word in a text. Hence, the list of subjects in the 
window 1212 corresponds to indices. 

When the icon 1210 on the list of scenes is mouse- 
clicked, the video on the monitor screen is switched to a 
scene represented by the clicked icon. The list of sub- 
jects consists of icons 1214 indicating what the subject 
is and a display duration graph (bar graph) 1216 on the 
right side of the icons. The duration display graph (bar 
graph) indicates the time during which the subject 
appears in the video, with the left end of the bar repre- 
senting the start of the video and the right end repre- 
senting the end of the video. Clicking on the bar displays 
the video of that time interval on the monitor screen. 
1204 is a cursor that moves according to the movement of a pointing device such as a mouse. A window 1206 is a general input-output window that displays a variety of information related to the video. Such a graphical user interface realizes user-friendly video editing.
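The click-to-interval behaviour of the duration bar 1216 reduces to a linear mapping from pixel position to frame number. The sketch below assumes pixel coordinates for the bar and represents the subject's appearance intervals as hypothetical half-open frame ranges.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # [start_frame, end_frame) during which a subject appears


def clicked_interval(click_x: float, bar_left: float, bar_width: float,
                     total_frames: int, intervals: List[Interval]) -> Optional[Interval]:
    """Map a click on the duration bar to the appearance interval under it.

    The bar's left edge is the start of the video and its right edge the end,
    so the click position maps linearly onto frame numbers.
    """
    frac = min(max((click_x - bar_left) / bar_width, 0.0), 1.0)
    frame = min(int(frac * total_frames), total_frames - 1)
    for start, end in intervals:
        if start <= frame < end:
            return (start, end)
    return None
```

For a 200-pixel bar starting at x = 100 over a 1000-frame video, a click at x = 150 lands on frame 250 and selects the interval (200, 400). That interval would then be played on the monitor screen.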

The present invention can be incorporated as a 
function of a video edit program into high-end systems 
for broadcasting stations, and into workstations and 
personal computers. Further, the present invention can 
be realized as one function of electronic devices such 
as VTR and TV and can also be developed into various 
equipment and systems that realize video on demand 
(VOD). 

Because the present invention can distinguish between an instantaneous image disturbance and a point of change between video shots, it prevents a part containing a disturbance from being erroneously detected as a point of change between video shots, and it also allows only the part with the disturbance to be picked up. Further, since the threshold for detecting a point of change can be changed dynamically according to the features of the immediately preceding video frame, points of change can be detected precisely according to the features of the video image, which prevents erroneous detections.

While the present invention has been described in
detail and pictorially in the accompanying drawings, it is
not limited to such details, since many changes and
modifications recognizable to those of ordinary skill in 
the art may be made to the invention without departing 
from the spirit and the scope thereof. 

Claims 

1. A method of detecting a point of change between 
video shots from a video consisting of a plurality of 
succeeding frames, said method comprising the 
steps of: 

inputting a video chronologically one frame 
at a time into a processing device; 

calculating a feature quantity of video image 
data for each frame; 

determining a first correlation coefficient 
between a feature quantity of a current frame and a 
feature quantity calculated from an immediately 
preceding frame and determining a second correla- 
tion coefficient between the feature quantity of the 
current frame and a feature quantity of at least two 
frames preceding said current frame; and 

indicating a point of change between video
shots at a time when the first correlation coefficient
and the second correlation coefficient are out of 
predetermined allowable ranges. 

2. A method of detecting a point of change between 
video shots from a video consisting of a plurality of 
succeeding frames, said method comprising the 
steps of: 

inputting a video chronologically one frame 
at a time into a processing device; 



calculating a feature quantity of video image 
data for each frame; 

determining a correlation coefficient 
between a feature quantity of a current frame and a 
feature quantity calculated from an immediately
preceding frame; and 

indicating a point of change between video 
shots when a correlation coefficient of the feature 
quantity of an immediately preceding frame or an 
immediately following frame changes out of an
allowable range. 

3. The method of claim 1 or 2, further comprising the 
steps of: 

extracting a frame located at said point of
change between video shots or a frame located at
a position a predetermined offset from said point of
change between video shots;

resizing the frame to a preset size; and 

storing the resized frame in a storage device
or a medium.

4. The method of claim 3, further comprising the steps 
of: 

storing the resized frame in a storage device
or medium by relating it to at least one of four sets
of associated information including first information
representing a position of the point of change in
video, second information representing a time
when the point of change occurred, third informa-
tion representing a distance or time between the
point of change and an immediately preceding point
of change, and fourth information describing
attributes of the point of change or of a shot begin-
ning with the point of change.

5. A system for detecting a point of change between 
video shots from a video consisting of a plurality of 
succeeding frames, said system comprising: 

video playback apparatus for playing a video
chronologically one frame at a time;

a display for displaying said video; and 
a processing device for calculating a feature
quantity of video image data for each frame, deter-
mining a first correlation coefficient between a fea-
ture quantity of a current frame and a feature
quantity calculated from an immediately preceding
frame and determining a second correlation coeffi-
cient between the feature quantity of the current
frame and a feature quantity of at least two frames
preceding said current frame, and indicating on
said display a point of change between video shots
when the first correlation coefficient and the second
correlation coefficient are out of predetermined
allowable ranges.

6. A system for detecting a point of change between 
video shots from a video consisting of a plurality of 
succeeding frames, said system comprising: 



EP0 729117 A1 



a video playback apparatus for playing a 
video chronologically one frame at a time; 

a display for displaying said video; 

a processing device for calculating a feature
quantity of video image data for each frame, deter-
mining a correlation coefficient between a feature
quantity of a current frame and a feature quantity
calculated from an immediately preceding frame,
and indicating on said display a point of change
between video shots when a correlation coefficient
of the feature quantity of an immediately preceding
frame or an immediately following frame changes
out of an allowable range.

7. The system of claim 5 or 6, further comprising:

means for extracting a frame located at the
point of change in shot or a frame located at a posi-
tion a predetermined offset from the point of
change;

means for resizing the frame to a preset size;
and

means for storing the resized frame in a stor-
age device or a medium.

8. The system of claim 7, further comprising:

means for storing the resized frame in a stor-
age device or medium by relating it to at least one
of four sets of associated information including first
information representing a position of the point of
change in video, second information representing a
time when the point of change occurred, third infor-
mation representing a distance or time between the
point of change and an immediately preceding point
of change, and fourth information describing
attributes of the point of change or of a shot begin-
ning with the point of change.

9. A computer program product for use with a compu-
ter having a display, said computer being used to
detect a point of change between video shots from
a video consisting of a plurality of succeeding
frames, said computer program comprising:

a computer readable medium with the com-
puter program recorded thereon, the computer pro-
gram comprises:

a first code section for inputting a video
chronologically one frame at a time into a process-
ing device,

a second code section for calculating a fea-
ture quantity of video image data for each frame,

a third code section for determining a first
correlation coefficient between a feature quantity of
a current frame and a feature quantity calculated
from an immediately preceding frame and deter-
mining a second correlation coefficient between the
feature quantity of the current frame and a feature
quantity of at least two frames preceding said cur-
rent frame, and

a fourth code section for indicating a point of
change between video shots when the first correla-
tion coefficient and the second correlation coeffi-
cient are out of predetermined allowable ranges.

10. A computer program product for use with a compu- 
ter having a display, said computer being used to 
detect a point of change between video shots from 
a video consisting of a plurality of succeeding 
frames, said computer program comprising: 

a logic circuit with the computer program 
recorded thereon, the computer program com- 
prises: 

a first code section for inputting a video 
chronologically one frame at a time into a process- 
ing device, 

a second code section for calculating a fea- 
ture quantity of video image data for each frame, 

a third code section for determining a first 
correlation coefficient between a feature quantity of 
a current frame and a feature quantity calculated 
from an immediately preceding frame and deter- 
mining a second correlation coefficient between the 
feature quantity of the current frame and a feature 
quantity of at least two frames preceding said cur- 
rent frame, and 

a fourth code section for indicating a point of 
change between video shots when the first correla- 
tion coefficient and the second correlation coeffi- 
cient are out of predetermined allowable ranges. 

11. The method of claim 1 or the computer program
product of claim 9 or 10,

wherein in indicating whether the first correlation 
coefficient and the second correlation coefficient 
are out of the predetermined allowable ranges, the 
predetermined allowable ranges are changed 
according to the feature quantity of an immediately 
preceding frame or an immediately following frame. 

12. A computer program product for use with a compu-
ter having a display, said computer being used to
detect a point of change between video shots from
a video consisting of a plurality of succeeding
frames, said computer program comprising:

a computer readable medium with the com-
puter program recorded thereon, the computer program
comprises:

a first code section for inputting a video 
chronologically one frame at a time into a process- 
ing device, 

a second code section for calculating a fea- 
ture quantity of video image data for each frame, 

a third code section for determining a corre- 
lation coefficient between a feature quantity of a 
current frame and a feature quantity calculated 
from an immediately preceding frame, and 

a fourth code section for indicating a point of 
change between video shots when a correlation 
coefficient of the feature quantity of an immediately 
preceding frame or an immediately following frame
changes out of an allowable range. 

13. A computer program product for use with a compu-
ter having a display, said computer being used to
detect a point of change between video shots from
a video consisting of a plurality of succeeding
frames, said computer program comprising:

a logic circuit with the computer program recorded
thereon, the computer program comprises:

a first code section for inputting a video
chronologically one frame at a time into a process-
ing device,

a second code section for calculating a fea-
ture quantity of video image data for each frame,

a third code section for determining a corre-
lation coefficient between a feature quantity of a
current frame and a feature quantity calculated
from an immediately preceding frame, and

a fourth code section for indicating a point of
change between video shots when a correlation
coefficient of the feature quantity of an immediately
preceding frame or an immediately following frame
changes out of an allowable range.

14. The method of claim 1 or 2 or the system of claim 5
or 6 or the computer program product of any of
claims 9 to 13, wherein said feature quantity is a
color histogram.

15. The computer program product of any of claims 9 to
13, further comprising:

a fifth code section for extracting a frame
located at the point of change in shot or a frame
located at a position a predetermined offset from
the point of change;

a sixth code section for resizing the frame to
a preset size; and

a seventh code section for storing the
resized frame in a storage device or a medium.

16. The computer program product of claim 15, further 
comprising: 

an eighth code section for storing the resized
frame in a storage device or medium by relating it to
at least one of four sets of associated information
including first information representing a position of
the point of change in video, second information
representing a time when the point of change
occurred, third information representing a distance
or time between the point of change and an imme-
diately preceding point of change, and fourth infor-
mation describing attributes of the point of change
or of a shot beginning with the point of change.

17. The method of claim 4 or the system of claim 8 or
the computer program product of claim 16, wherein
the video frames and their related information are 
managed in a single file. 



18. The method of claim 3 or the system of claim 7 or
the computer program product of claim 15,
wherein a buffer is used to store two or more latest 
frames or resized frames at all times so that when a 
point of change is detected in a preceding frame, 
the frame containing the point of change is 
extracted from the buffer. 

19. The method of claim 1 or 2 or the system of claim 5 
or 6 or the computer program product of any of 
claims 9 to 13, wherein the feature quantity of an 
immediately preceding frame or an immediately fol- 
lowing frame is an overall brightness level of at least 
an immediately preceding frame or at least an 
immediately following frame. 

20. The method of claim 1 or 2 or the system of claim 5 
or 6 or the computer program product of any of 
claims 9 to 13, wherein the feature quantity of an 
immediately preceding frame or an immediately fol- 
lowing frame is a correlation coefficient between 
the current frame and an immediately preceding or 
following frame. 

21. The method of claim 1 or 2 or the system of claim 5 
or 6 or the computer program product of any of 
claims 9 to 13, wherein the feature quantity of an 
immediately preceding frame or an immediately fol- 
lowing frame is the number of frames to be proc- 
essed per unit time for the immediately preceding 
or following video image. 

22. A method of detecting a point of change between 
video shots from a video consisting of a plurality of 
succeeding frames, said method comprising the 
steps of: 

permitting a user to manually locate a point 
of change between video shots; 

registering as said point of change between 
video shots a point a predetermined time before or 
after said point manually located by said user. 

23. A method of detecting a point of change between 
video shots from a video consisting of a plurality of 
succeeding frames, said method comprising the 
steps of: 

permitting a user to manually locate a point 
of change between video shots; and 

registering as a point of change between
video shots a voice terminating point closest to
said point manually located by said user.